Machine Learning (ML) in Bioinformatics

Supervised learning algorithms


Prerequisites: None.
Level: Beginner.
Learning objectives:
- Gain a basic understanding of what supervised learning is.

Welcome to this learning module on supervised learning algorithms! By studying this module, you will learn about the fundamentals of supervised learning and various algorithms that fall under this category. Before diving into the details of supervised learning algorithms, let's first understand what supervised learning is.

What is Supervised Learning?

Supervised learning is a type of machine learning in which we train a model on labeled data, meaning that each input is accompanied by its correct output. The goal of the model is to learn the mapping function from inputs to outputs.

For example, consider a task where you must predict real estate prices based on size, age, and location. You are given a dataset with the size, age, and location of several properties and their corresponding prices. We use this dataset to train a supervised learning model, which can then predict the price of a property from its size, age, and location.

In supervised learning, the model is trained on one labeled dataset (the training set) and then tested on a separate labeled dataset (the test set) that it did not see during training. The model's performance is judged by how well it predicts the outputs for this unseen data.
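As a minimal sketch of this train-then-test workflow, the Python example below fits a straight line to a few labeled examples and then measures its error on held-out examples. The data values, the four/two split, and the helper names fit_line and mean_absolute_error are assumptions made for illustration; they do not come from a real dataset or library.

```python
# Hypothetical labeled data: (size in square meters, price in thousands).
# The numbers are invented for illustration only.
data = [(50, 155), (60, 175), (70, 215), (80, 235), (90, 270), (100, 300)]

# Train on the first four examples; hold out the last two as unseen test data.
train, test = data[:4], data[4:]


def fit_line(pairs):
    """Least-squares fit of y = slope * x + intercept for a single input."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(pairs, slope, intercept):
    """Average absolute gap between the predicted and the true output."""
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


slope, intercept = fit_line(train)                        # learn from labeled data
test_error = mean_absolute_error(test, slope, intercept)  # evaluate on unseen data
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test error={test_error:.2f}")
```

The important point is that the test examples play no part in fitting the line, so the test error gives an estimate of how the model behaves on data it has not seen.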

Now that you have a basic understanding of supervised learning, let's take a look at the different types of supervised learning algorithms.

Types of Supervised Learning Algorithms

There are two principal types of supervised learning algorithms:

Classification algorithms:
These algorithms predict a discrete outcome, such as Yes/No or 0/1. For example, a classification algorithm can predict whether customers will churn based on their past interactions with the company.
Regression algorithms:
These algorithms predict a continuous outcome, such as a real number. For example, a regression algorithm can predict real estate prices based on size, age, and location.
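The difference between the two types also shows up in how predictions are scored. The sketch below is illustrative only: the labels, prices, and predictions are invented, and accuracy and mean squared error are just two common choices of metric.

```python
# Classification: each label comes from a fixed set of classes
# (here 1 = customer churns, 0 = customer stays). Values are made up.
true_labels = [1, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 1, 0]

# Accuracy: the fraction of predictions that match the true class exactly.
accuracy = sum(t == p for t, p in zip(true_labels, predicted_labels)) / len(true_labels)

# Regression: each target is a real number (here a price in thousands).
true_prices = [200.0, 310.0, 150.0]
predicted_prices = [210.0, 300.0, 155.0]

# Mean squared error: the average squared gap between prediction and target.
mse = sum((t - p) ** 2 for t, p in zip(true_prices, predicted_prices)) / len(true_prices)

print(f"accuracy={accuracy}, mse={mse}")
```

A classification prediction is either right or wrong, whereas a regression prediction can be close or far from the target, which is why the two tasks are usually evaluated with different metrics.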

Let's now take a closer look at some popular supervised learning algorithms:

Linear regression:
Linear regression is a simple and widely used regression algorithm that tries to find the straight-line relationship between the input and output variables. It is used to predict a continuous outcome, such as the price of a house.
Logistic regression:
Logistic regression is a classification algorithm that predicts a binary outcome (Yes/No or 0/1). It estimates the probability of an event occurring, such as whether a customer will churn or not.
Decision tree:
A decision tree is a tree-like model that makes decisions by applying a sequence of rules to the input. It can predict a discrete outcome, such as whether a customer will churn.
Random forest:
A random forest is a learning algorithm made up of multiple decision trees. It improves the model's accuracy by aggregating the predictions of the individual trees. Because it combines several models, it is called an "ensemble algorithm."
Support vector machine (SVM):
SVM is a classification algorithm that finds a line or, in higher dimensions, a hyperplane that separates the different classes. It predicts a discrete outcome, such as whether a customer will churn.
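To make the decision tree and random forest descriptions above more concrete, here is a small Python sketch. Each "tree" is a hand-written set of rules, and the forest aggregates their predictions by majority vote. The feature names, thresholds, and function names are hypothetical. A real decision tree learns its rules from labeled data, and a real random forest also trains each tree on a random sample of the data, so this sketch covers only the aggregation step.

```python
from collections import Counter


# Each function stands in for one trained decision tree.
# Feature names and thresholds are hypothetical, chosen for illustration.
def tree_a(customer):
    if customer["months_active"] < 6:
        return "churn"
    return "stay"


def tree_b(customer):
    if customer["support_tickets"] > 3:
        return "churn"
    return "stay"


def tree_c(customer):
    # Two nested rules: low spend AND a fairly new customer.
    if customer["monthly_spend"] < 20:
        if customer["months_active"] < 12:
            return "churn"
    return "stay"


def forest_predict(trees, customer):
    """Aggregate the individual trees' predictions by majority vote."""
    votes = Counter(tree(customer) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_a, tree_b, tree_c]
new_customer = {"months_active": 4, "support_tickets": 1, "monthly_spend": 15}
prediction = forest_predict(trees, new_customer)
print(prediction)
```

Using an odd number of trees avoids ties in a two-class vote. Here two of the three trees vote for churn, so the forest's prediction follows the majority even though one tree disagrees.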

These are just a few examples of supervised learning algorithms. There are many more algorithms that you can learn about, such as k-nearest neighbors (KNN), artificial neural networks (ANN), and gradient boosting.

The Key Point to Remember

Supervised learning is machine learning in which the model is trained on labeled data.